Results 1 - 20 of 24
1.
Viruses ; 16(4)2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38675899

ABSTRACT

Lumpy skin disease virus (LSDV) is a member of the capripoxvirus (CPPV) genus of the Poxviridae family. LSDV is a rapidly emerging, high-consequence pathogen of cattle, recently spreading from Africa and the Middle East into Europe and Asia. We have sequenced the whole genome of historical LSDV isolates from the Pirbright Institute virus archive, and field isolates from recent disease outbreaks in Sri Lanka, Mongolia, Nigeria and Ethiopia. These genome sequences were compared to published genomes and classified into different subgroups. Two subgroups contained vaccine or vaccine-like samples ("Neethling-like" clade 1.1 and "Kenya-like" subgroup, clade 1.2.2). One subgroup was associated with outbreaks of LSD in the Middle East/Europe (clade 1.2.1) and a previously unreported subgroup originated from cases of LSD in west and central Africa (clade 1.2.3). Isolates were also identified that contained a mix of genes from both wildtype and vaccine samples (vaccine-like recombinants, grouped in clade 2). Whole genome sequencing and analysis of LSDV strains isolated from different regions of Africa, Europe and Asia have provided new knowledge of the drivers of LSDV emergence, and will inform future disease control strategies.


Subject(s)
Genome, Viral , Lumpy Skin Disease , Lumpy skin disease virus , Phylogeny , Whole Genome Sequencing , Lumpy skin disease virus/genetics , Lumpy skin disease virus/classification , Lumpy skin disease virus/isolation & purification , Animals , Lumpy Skin Disease/virology , Lumpy Skin Disease/epidemiology , Cattle , Africa, Central/epidemiology , Africa, Western/epidemiology , Disease Outbreaks
2.
Microbiol Resour Announc ; 13(4): e0006724, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38526091

ABSTRACT

African swine fever virus causes a lethal hemorrhagic disease of domestic pigs. The NAM P1/1995 isolate was originally described as B646L genotype XVIII; however, full genome sequencing revealed that this assignment was incorrect.

3.
PLoS One ; 19(3): e0293049, 2024.
Article in English | MEDLINE | ID: mdl-38512923

ABSTRACT

African swine fever (ASF) is a devastating disease of domestic pigs that has spread across the globe since its introduction into Georgia in 2007. The etiological agent is a large double-stranded DNA virus with a genome of 170 to 180 kb in length depending on the isolate. Much of the difference in genome length between isolates is due to variation in the copy number of five different multigene families that are encoded in repetitive regions towards the covalently closed termini of the genome. Molecular epidemiology of African swine fever virus (ASFV) is primarily based on Sanger sequencing of a few conserved and variable regions but, owing to the stability of the dsDNA genome, changes in the variable regions occur relatively slowly. Observations in Europe and Asia have shown that changes in other genetic loci can occur and that this could be useful in molecular tracking. ASFV has been circulating in Western Africa for at least forty years. It is therefore reasonable to assume that changes may have accumulated in regions of the genome other than the standard targets over the years. At present only one full genome sequence is available for an isolate from Western Africa, that of a highly virulent isolate collected from Benin during an outbreak in 1997. In Cameroon, ASFV was first reported in 1981; outbreaks have been reported up to the present day, and the disease is considered endemic. Here we report three full genome sequences from Cameroon isolates from the 1982, 1994 and 2018 outbreaks and identify novel single nucleotide polymorphisms and insertion-deletions that may prove useful for molecular epidemiology studies in Western Africa and beyond.


Subject(s)
African Swine Fever Virus , African Swine Fever , Swine , Animals , African Swine Fever/epidemiology , Cameroon/epidemiology , Sus scrofa/genetics , Sequence Analysis , Sequence Analysis, DNA
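
Illustrative note on entry 3: the abstract reports single nucleotide polymorphisms and insertion-deletions found by comparing full genomes, without describing the pipeline. The following is a minimal Python sketch of how such differences could be listed from a pairwise alignment. The function name, the toy alignment and the coordinate convention are assumptions made for illustration only, not the authors' method.

```python
def call_differences(ref_aln, qry_aln):
    """List SNPs and indels between two gapped, equal-length aligned sequences.

    Hypothetical helper for illustration. Coordinates are 1-based reference
    positions; an insertion is reported at the last reference base before it,
    a deletion at the first deleted reference base.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")

    snps, indels = [], []
    ref_pos = 0          # number of reference bases consumed so far
    open_indel = None    # [kind, ref_pos, length] of the indel being extended

    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == "-" and q == "-":
            continue  # column with gaps in both sequences carries no information
        kind = "insertion" if r == "-" else "deletion" if q == "-" else None
        if kind is None:
            open_indel = None  # an aligned base pair closes any running indel
            if r != q:
                snps.append((ref_pos, r, q))
        elif open_indel is not None and open_indel[0] == kind:
            open_indel[2] += 1  # extend the current indel by one column
        else:
            open_indel = [kind, ref_pos, 1]
            indels.append(open_indel)

    return snps, [tuple(i) for i in indels]


# Toy alignment (made-up sequences, not ASFV data)
snps, indels = call_differences("ACGT-ACGT", "ACCTTAC--")
print(snps)    # [(3, 'G', 'C')]
print(indels)  # [('insertion', 4, 1), ('deletion', 7, 2)]
```

In practice the alignment would come from a dedicated aligner, and candidate differences in repetitive regions would need manual review.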
4.
J Virol ; 97(3): e0003823, 2023 03 30.
Article in English | MEDLINE | ID: mdl-36779761

ABSTRACT

Coronaviruses infect a wide variety of host species, resulting in a range of diseases in both humans and animals. The coronavirus genome consists of a large positive-sense single-stranded molecule of RNA containing many RNA structures. One structure, denoted s2m and consisting of 41 nucleotides, is located within the 3' untranslated region (3' UTR) and is shared between some coronavirus species, including infectious bronchitis virus (IBV), severe acute respiratory syndrome coronavirus (SARS-CoV), and SARS-CoV-2, as well as other pathogens, including human astrovirus. Using a reverse genetic system to generate recombinant viruses, we investigated the requirement of the s2m structure in the replication of IBV, a globally distributed economically important Gammacoronavirus that infects poultry causing respiratory disease. Deletion of three nucleotides predicted to destabilize the canonical structure of the s2m or the deletion of the nucleotides corresponding to s2m impacted viral replication in vitro. In vitro passaging of the recombinant IBV with the s2m sequence deleted resulted in a 36-nucleotide insertion in place of the deletion, which was identified to be composed of a duplication of flanking sequences. A similar result was observed following serial passage of human astrovirus with a deleted s2m sequence. RNA modeling indicated that deletion of the nucleotides corresponding to the s2m impacted other RNA structures present in the IBV 3' UTR. Our results indicated for both IBV and human astrovirus a preference for nucleotide occupation in the genome location corresponding to the s2m, which is independent of the specific s2m sequence. IMPORTANCE Coronaviruses infect many species, including humans and animals, with substantial effects on livestock, particularly with respect to poultry. The coronavirus RNA genome consists of structural elements involved in viral replication whose roles are poorly understood. 
We investigated the requirement of the RNA structural element s2m in the replication of the Gammacoronavirus infectious bronchitis virus, an economically important viral pathogen of poultry. Using reverse genetics to generate recombinant IBVs with either a disrupted or deleted s2m, we showed that the s2m is not required for viral replication in cell culture; however, replication is decreased in tracheal tissue, suggesting a role for the s2m in the natural host. Passaging of these viruses as well as human astrovirus lacking the s2m sequence demonstrated a preference for nucleotide occupation, independent of the s2m sequence. RNA modeling suggested deletion of the s2m may negatively impact other essential RNA structures.


Subject(s)
Infectious bronchitis virus , Mamastrovirus , Mutagenesis, Insertional , Animals , Humans , 3' Untranslated Regions/genetics , Chickens/virology , Infectious bronchitis virus/genetics , Mamastrovirus/genetics , Mutagenesis, Insertional/genetics , Poultry Diseases/virology , RNA, Viral/genetics , Virus Replication/genetics , RNA Stability/genetics , Sequence Deletion/genetics
5.
BMC Genomics ; 23(1): 406, 2022 May 30.
Article in English | MEDLINE | ID: mdl-35644636

ABSTRACT

BACKGROUND: Non-targeted whole genome sequencing is a powerful tool to comprehensively identify constituents of microbial communities in a sample. Because the analysis does not need to be directed towards any particular target before sequencing, it can reduce the introduction of bias and false-negative results. It also allows the assessment of genetic aberrations in the genome (e.g., single nucleotide variants, deletions, insertions and copy number variants), including in non-protein-coding regions. METHODS: The performance of four different random priming amplification methods to recover RNA viral genetic material of SARS-CoV-2 was compared in this study. In method 1 (H-P) the reverse transcriptase (RT) step was performed with random hexamers, whereas in methods 2-4 the RT step incorporated an octamer primer with a known tag. In methods 1 and 2 (K-P) sequencing was applied to material derived from the RT-PCR step, whereas in methods 3 (SISPA) and 4 (S-P) an additional amplification was incorporated before sequencing. RESULTS: The SISPA method was the most effective and efficient method for non-targeted/random priming whole genome sequencing of SARS-CoV-2 that we tested. The SISPA method described in this study allowed for whole genome assembly of SARS-CoV-2 and influenza A(H1N1)pdm09 in mixed samples. We determined the limit of detection and characterization of SARS-CoV-2 virus, which was 10^3 pfu/ml (Ct, 22.4) for whole genome assembly and 10^1 pfu/ml (Ct, 30) for metagenomics detection. CONCLUSIONS: The SISPA method is particularly useful for obtaining genome sequences from RNA viruses or investigating complex clinical samples, as no prior sequence information is needed. It might be applied to monitor genomic changes and evolution of viruses, and can be used for fast metagenomics detection or to assess the general picture of different pathogens within a sample.


Subject(s)
COVID-19 , Influenza A Virus, H1N1 Subtype , RNA Viruses , Genome, Viral , Humans , SARS-CoV-2/genetics , Whole Genome Sequencing
6.
Viruses ; 14(3)2022 03 16.
Article in English | MEDLINE | ID: mdl-35337028

ABSTRACT

Foot-and-mouth disease (FMD) is endemic in large parts of sub-Saharan Africa, Asia and South America, where outbreaks in cloven-hooved livestock threaten food security and have severe economic impacts. Vaccination in endemic regions remains the most effective control strategy. Current FMD vaccines are produced from chemically inactivated foot-and-mouth disease virus (FMDV) grown in suspension cultures of baby hamster kidney 21 cells (BHK-21). Strain diversity means vaccines produced from one subtype may not fully protect against circulating disparate subtypes, necessitating the development of new vaccine strains that "antigenically match". However, some viruses have proven difficult to adapt to cell culture, slowing the manufacturing process, reducing vaccine yield and limiting the availability of effective vaccines, as well as potentiating the selection of undesired antigenic changes. To circumvent the need to adapt FMDV to cell culture, we have used a systematic approach to develop recombinant suspension BHK-21 cells that stably express the key FMDV receptor integrin αvβ6. We show that αvβ6 expression is retained at consistently high levels as a mixed cell population and as a clonal cell line. Following exposure to field strains of FMDV, these recombinant BHK-21 cells facilitated higher virus yields compared to both parental and control BHK-21 cells, whilst demonstrating comparable growth kinetics. The presented data supports the application of these recombinant αvβ6-expressing BHK-21 cells in future FMD vaccine production.


Subject(s)
Foot-and-Mouth Disease Virus , Foot-and-Mouth Disease , Viral Vaccines , Animals , Cell Line , Foot-and-Mouth Disease Virus/genetics , Vaccination , Viral Vaccines/genetics
7.
Oncogene ; 40(47): 6479-6493, 2021 11.
Article in English | MEDLINE | ID: mdl-34611310

ABSTRACT

Androgen receptor (AR) plays a central role in driving prostate cancer (PCa) progression. How AR promotes this process is still not completely clear. Herein, we used single-cell transcriptome analysis to reconstruct the transcriptional network of AR in PCa. Our work shows AR directly regulates a set of signature genes in the ER-to-Golgi protein vesicle-mediated transport pathway. The expression of these genes is required for maximum androgen-dependent ER-to-Golgi trafficking, cell growth, and survival. Our analyses also reveal the signature genes are associated with PCa progression and prognosis. Moreover, we find inhibition of the ER-to-Golgi transport process with a small molecule enhanced antiandrogen-mediated tumor suppression of hormone-sensitive and insensitive PCa. Finally, we demonstrate AR collaborates with CREB3L2 in mediating ER-to-Golgi trafficking in PCa. In summary, our findings uncover a critical role for dysregulation of ER-to-Golgi trafficking expression and function in PCa progression, provide detailed mechanistic insights for how AR tightly controls this process, and highlight the prospect of targeting the ER-to-Golgi pathway as a therapeutic strategy for advanced PCa.


Subject(s)
Androgens/pharmacology , Basic-Leucine Zipper Transcription Factors/metabolism , Endoplasmic Reticulum/pathology , Gene Expression Regulation, Neoplastic/drug effects , Golgi Apparatus/pathology , Prostatic Neoplasms/pathology , Receptors, Androgen/metabolism , Animals , Apoptosis , Basic-Leucine Zipper Transcription Factors/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Cell Proliferation , Endoplasmic Reticulum/drug effects , Endoplasmic Reticulum/metabolism , Gene Regulatory Networks , Golgi Apparatus/drug effects , Golgi Apparatus/metabolism , Humans , Male , Mice , Prognosis , Prostatic Neoplasms/drug therapy , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Receptors, Androgen/genetics , Single-Cell Analysis/methods , Survival Rate , Transcriptome , Tumor Cells, Cultured , Xenograft Model Antitumor Assays
8.
Animals (Basel) ; 11(10)2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34679994

ABSTRACT

Peste des petits ruminants virus (PPRV) causes a highly devastating disease of sheep and goats that threatens food security, small ruminant production and susceptible endangered wild ruminants. With policy directed towards achieving global PPR eradication, the establishment of cost-effective genomic surveillance tools is critical where PPR is endemic. Genomic data can provide sufficient in-depth information to identify the pockets of endemicity responsible for PPRV persistence and viral evolution, and direct an appropriate vaccination response. Yet, access to the required sequencing technology is low in resource-limited settings and is compounded by the difficulty of transporting clinical samples from wildlife across international borders due to the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) and Nagoya Protocol regulations. Oxford nanopore MinION sequencing technology has recently demonstrated an extraordinary performance in the sequencing of PPRV due to its rapidity, utility in endemic countries and comparatively low cost per sample when compared to other whole-genome sequencing (WGS) platforms. In the present study, Oxford nanopore MinION sequencing was utilised to generate complete genomes of PPRV isolates collected from infected goats in Ngorongoro and Momba districts in the northern and southern highlands of Tanzania during 2016 and 2018, respectively. The tiling multiplex polymerase chain reaction (PCR) was carried out with twenty-five pairs of long-read primers. The resulting PCR amplicons were used for nanopore library preparation and sequencing. Analysis of the output data yielded complete genomes of PPRV, produced within four hours of sequencing (accession numbers: MW960272 and MZ322753). Phylogenetic analysis of the complete genomes revealed a high nucleotide identity, between 96.19 and 99.24%, with lineage III PPRV currently circulating in East Africa, indicating a common origin. 
The Oxford nanopore MinION sequencer can be deployed to overcome diagnostic and surveillance challenges in the PPR Global Control and Eradication program. However, the coverage depth was uneven across the genome and amplicon dropout was observed mainly in the GC-rich region between the matrix (M) and fusion (F) genes of PPRV. Thus, larger field studies are needed to allow the collection of sufficient data to assess the robustness of nanopore sequencing technology.

9.
Viruses ; 13(9)2021 09 16.
Article in English | MEDLINE | ID: mdl-34578428

ABSTRACT

Many viruses that cause serious diseases in humans and animals, including the betacoronaviruses (beta-CoVs), such as SARS-CoV, MERS-CoV, and the recently identified SARS-CoV-2, have natural reservoirs in bats. Because these viruses rely entirely on the host cellular machinery for survival, their evolution is likely to be guided by the link between the codon usage of the virus and that of its host. As a result, specific cellular microenvironments of the diverse hosts and/or host tissues imprint peculiar molecular signatures in virus genomes. Our study is aimed at deciphering some of these signatures. Using a variety of genetic methods, we demonstrated that trends in codon usage across chiroptera-hosted CoVs are collaboratively driven by geographically different host species and temporal-spatial distribution. We not only found that chiroptera-hosted CoVs are the ancestors of SARS-CoV-2, but we also revealed that SARS-CoV-2 has codon usage characteristics similar to those seen in CoVs infecting Rhinolophus sp. Surprisingly, the envelope gene of beta-CoVs infecting Rhinolophus sp., including SARS-CoV-2, had extremely high CpG levels, which appears to be an evolutionarily conserved trait. The dissection of the furin cleavage site of various CoVs infecting hosts revealed host-specific preferences for arginine codons; however, arginine is encoded by a wider variety of synonymous codons in the murine CoV (MHV-A59) furin cleavage site. Our findings also highlight the latent diversity of CoVs in mammals that has yet to be fully explored.


Subject(s)
Chiroptera/virology , Codon Usage , Coronavirus/genetics , Evolution, Molecular , Animals , Furin/metabolism , Genetic Variation , Genome, Viral
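
Illustrative note on entry 9: the abstract discusses arginine codon preferences and CpG levels. The sketch below shows two generic, widely used summary metrics for these: relative synonymous codon usage (RSCU) restricted to the six arginine codons, and the CpG observed/expected ratio. The function names and toy sequences are assumptions for illustration; the abstract does not state which exact metrics or software the authors used.

```python
from collections import Counter

# The six synonymous codons for arginine in the standard genetic code
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def arginine_rscu(cds):
    """RSCU of each arginine codon in an in-frame coding sequence.

    RSCU = observed count / (total arginine codons / 6), so a value of 1.0
    means the codon is used exactly as often as expected with no bias.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in ARG_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in ARG_CODONS}
    return {c: counts[c] * len(ARG_CODONS) / total for c in ARG_CODONS}


def cpg_obs_exp(seq):
    """CpG observed/expected ratio: count(CG) * N / (count(C) * count(G))."""
    seq = seq.upper().replace("U", "T")
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


# Toy inputs (made-up sequences, not taken from any coronavirus genome)
print(arginine_rscu("AGAAGACGT")["AGA"])  # 4.0
print(cpg_obs_exp("CGCG"))                # 2.0
```

A CpG ratio well below 1 indicates CpG depletion, while values near or above 1 correspond to the kind of elevated CpG content the abstract describes for the envelope gene.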
10.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33866372

ABSTRACT

Intrinsically disordered regions/proteins (IDRs) are abundant across all the domains of life, where they perform important regulatory roles and supplement the biological functions of structured proteins/regions (SRs). Despite the multifunctionality of IDRs, several questions about the evolution of the viral genomic regions that encode IDRs in diverse viral proteins remain unanswered. To fill this gap, we benchmarked the findings of two of the most widely used and reliable intrinsic disorder prediction algorithms (IUPred2A and ESpritz) on a dataset of 6108 reference viral proteomes to unravel the multifaceted evolutionary forces that shape codon usage in the viral genomic regions encoding IDRs and SRs. We found persuasive evidence that natural selection predominantly governs the evolution of codon usage in regions encoding IDRs in most viruses. In addition, we confirm not only that codon usage in regions encoding IDRs is less optimized for the protein synthesis machinery (transfer RNA pool) of their host than in those encoding SRs, but also that the selective constraints imposed by codon bias sustain this reduced optimization in IDRs. Our analysis also establishes that IDRs in viruses are likely to tolerate more translational errors than SRs. All these findings hold true, irrespective of the disorder prediction algorithms used to classify IDRs. In conclusion, our study offers a novel perspective on the evolution of viral IDRs and the evolutionary adaptability to multiple taxonomically divergent hosts.


Subject(s)
Codon Usage/genetics , Evolution, Molecular , Genome, Viral/genetics , Intrinsically Disordered Proteins/genetics , Viral Proteins/genetics , Algorithms , Computational Biology/methods , CpG Islands/genetics , Intrinsically Disordered Proteins/metabolism , Mutation , Protein Biosynthesis/genetics , Protein Processing, Post-Translational , Proteome/genetics , Proteome/metabolism , Proteomics/methods , RNA, Transfer/genetics , RNA, Transfer/metabolism , Reproducibility of Results , Selection, Genetic , Viral Proteins/metabolism
11.
J Proteome Res ; 20(5): 2704-2713, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33719450

ABSTRACT

Much of our understanding of proteins and proteomes comes from the traditional protein structure-function paradigm. However, in the last 2 decades, both computational and experimental studies have provided evidence that a large fraction of functional proteomes across different domains of life consists of intrinsically disordered proteins, thus triggering a quest to unravel and decipher protein intrinsic disorder. Unlike structured/ordered proteins, intrinsically disordered proteins/regions (IDPs/IDRs) do not possess a well-defined structure under physiological conditions and exist as highly dynamic conformational ensembles. In spite of this peculiarity, these proteins have crucial roles in cell signaling and regulation. To date, studies on the abundance and function of IDPs/IDRs in viruses are rather limited. To fill this gap, we carried out an extensive and thorough bioinformatics analysis of 283 000 proteins from 6108 reference viral proteomes. We analyzed protein intrinsic disorder from multiple perspectives, such as abundance of IDPs/IDRs across diverse virus types, their functional annotations, and subcellular localization in taxonomically divergent hosts. We show that the content of IDPs/IDRs in viral proteomes varies broadly as a function of virus genome types and taxonomically divergent hosts. We have combined the two most commonly used and accurate IDP predictors' results with charge-hydropathy (CH) versus cumulative distribution function (CDF) plots to categorize the viral proteins according to their IDR content and physicochemical properties. Mapping of gene ontology on the disorder content of viral proteins reveals that IDPs are primarily involved in key virus-host interactions and host antiviral immune response downregulation, which are reinforced by the post-translational modifications tied to disorder-enriched viral proteins. 
The present study offers detailed insights into the prevalence of the intrinsic disorder in viral proteomes and provides appealing targets for the design of novel therapeutics.


Subject(s)
Intrinsically Disordered Proteins , Proteome , Intrinsically Disordered Proteins/genetics , Intrinsically Disordered Proteins/metabolism , Penetrance , Protein Conformation , Protein Processing, Post-Translational , Proteome/genetics , Viral Proteins/genetics
12.
J Gen Virol ; 101(10): 1103-1118, 2020 10.
Article in English | MEDLINE | ID: mdl-32720890

ABSTRACT

Coronavirus sub-genomic mRNA (sgmRNA) synthesis occurs via a process of discontinuous transcription involving complementary transcription regulatory sequences (TRSs), one (TRS-L) encompassing the leader sequence of the 5' untranslated region (UTR), and the other upstream of each structural and accessory gene (TRS-B). Several coronaviruses have an ORF located between the N gene and the 3'-UTR, an area previously thought to be non-coding in the Gammacoronavirus infectious bronchitis virus (IBV) due to a lack of a canonical TRS-B. Here, we identify a non-canonical TRS-B allowing for a novel sgmRNA relating to this ORF to be produced in several strains of IBV: Beaudette, CR88, H120, D1466, Italy-02 and QX. Interestingly, the potential protein produced by this ORF is prematurely truncated in the Beaudette strain. A single nucleotide deletion was made in the Beaudette strain allowing for the generation of a recombinant IBV (rIBV) that had the potential to express a full-length protein. Assessment of this rIBV in vitro demonstrated that restoration of the full-length potential protein had no effect on viral replication. Further assessment of the Beaudette-derived RNA identified a second non-canonically transcribed sgmRNA located within gene 2. Deep sequencing analysis of allantoic fluid from Beaudette-infected embryonated eggs confirmed the presence of both the newly identified non-canonically transcribed sgmRNAs and highlighted the potential for further yet unidentified sgmRNAs. This HiSeq data, alongside the confirmation of non-canonically transcribed sgmRNAs, indicates the potential of the coronavirus genome to encode a larger repertoire of genes than has currently been identified.


Subject(s)
Infectious bronchitis virus/genetics , RNA, Messenger/genetics , RNA, Viral/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , 5' Untranslated Regions/genetics , Animals , Base Sequence , Cell Line , Chickens , Chlorocebus aethiops , Coronavirus Infections/veterinary , Coronavirus Infections/virology , Open Reading Frames/genetics , Poultry Diseases/virology , Vero Cells , Viral Proteins/genetics , Viral Proteins/metabolism , Virus Replication/genetics
13.
Microbiol Resour Announc ; 9(10)2020 Mar 05.
Article in English | MEDLINE | ID: mdl-32139561

ABSTRACT

The full genome sequences of two isolates of bluetongue virus (BTV) from a commercial sheeppox vaccine were determined. Strain SPvvvv/02 shows low sequence identity to its closest relative, strain BTV-26 KUW2010/02, indicating the probable detection of a novel BTV genotype, whereas strain SPvvvv/03 shows high sequence identity to strain BTV-28/1537/14.

14.
J Transl Med ; 17(1): 273, 2019 08 20.
Article in English | MEDLINE | ID: mdl-31429776

ABSTRACT

BACKGROUND: Hepatocellular carcinoma is the second most deadly cancer with late presentation and limited treatment options, highlighting an urgent need to better understand HCC to facilitate the identification of early-stage biomarkers and uncover therapeutic targets for the development of novel therapies for HCC. METHODS: Deep transcriptome sequencing of tumor and paired non-tumor liver tissues was performed to comprehensively evaluate the profiles of both the host and HBV transcripts in HCC patients. Differential gene expression patterns and the dys-regulated genes associated with clinical outcomes were analyzed. Somatic mutations were identified from the sequencing data and the deleterious mutations were predicted. Lastly, human-HBV chimeric transcripts were identified, and their distribution, potential function and expression association were analyzed. RESULTS: Expression profiling identified the significantly upregulated TP73 as a nodal molecule modulating expression of apoptotic genes. Approximately 2.5% of dysregulated genes significantly correlated with HCC clinical characteristics. Of the 110 identified genes, those involved in post-translational modification, cell division and/or transcriptional regulation were upregulated, while those involved in redox reactions were downregulated in tumors of patients with poor prognosis. Mutation signature analysis identified that somatic mutations in HCC tumors were mainly non-synonymous, frequently affecting genes in the micro-environment and cancer pathways. Recurrent mutations occur mainly in ribosomal genes. The most frequently mutated genes were generally associated with a poorer clinical prognosis. Lastly, transcriptome sequencing suggests that HBV replication in the tumors of HCC patients is rare. HBV-human fusion transcripts are a common observation, with favored HBV and host insertion sites being the HBx C-terminus and gene introns (in tumors) and introns/intergenic-regions (in non-tumors), respectively. 
HBV-fused genes in tumors were mainly involved in RNA binding while those in non-tumor tissues varied widely. These observations suggest that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts may occur during tumorigenesis. CONCLUSIONS: Transcriptome sequencing of HCC patients reveals key cancer molecules and clinically relevant pathways deregulated/mutated in HCC patients and suggests that while HBV may integrate randomly during chronic infection, selective expression of functional chimeric transcripts likely occurs during the process of tumorigenesis.


Subject(s)
Carcinoma, Hepatocellular/genetics , Gene Expression Profiling , Liver Neoplasms/genetics , Transcriptome/genetics , Base Sequence , Cell Cycle/genetics , Chromosomes, Human/genetics , Gene Expression Regulation, Neoplastic , Genome, Viral , Hepatitis B virus/genetics , Humans , Introns/genetics , Male , Mutation/genetics , Open Reading Frames/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Repetitive Sequences, Nucleic Acid , Survival Analysis , Trans-Activators/genetics , Viral Regulatory and Accessory Proteins
15.
Genes (Basel) ; 10(8)2019 07 25.
Article in English | MEDLINE | ID: mdl-31349684

ABSTRACT

Current high-throughput sequencing technologies can generate sequence data and provide information on the genetic composition of samples at very high coverage. Deep sequencing approaches enable the detection of rare variants in heterogeneous samples, such as viral quasi-species, but also have the undesired effect of amplifying sequencing errors and artefacts. Distinguishing real variants from such noise is not straightforward. Variant callers that can handle pooled samples can struggle at extremely high read depths, while at lower depths sensitivity is often sacrificed to specificity. In this paper, we propose SiNPle (Simplified Inference of Novel Polymorphisms from Large coveragE), a fast and effective software tool for variant calling. SiNPle is based on a simplified Bayesian approach to compute the posterior probability that a variant is not generated by sequencing errors or PCR artefacts. The Bayesian model takes into consideration individual base qualities as well as their distribution, the baseline error rates during both the sequencing and the PCR stage, the prior distribution of variant frequencies and their strandedness. Our approach leads to an approximate but extremely fast computation of posterior probabilities even for very high coverage data, since the expression for the posterior distribution is a simple analytical formula in terms of summary statistics for the variants appearing at each site in the genome. These statistics can be used to filter out putative SNPs and indels according to the required level of sensitivity. We tested SiNPle on several simulated and real-life viral datasets to show that it is faster and more sensitive than existing methods. The source code for SiNPle is freely available to download and compile, or as a Conda/Bioconda package.


Subject(s)
Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , DNA, Viral/genetics , Genotyping Techniques/standards , High-Throughput Nucleotide Sequencing/standards , Sensitivity and Specificity , Sequence Analysis, DNA/standards
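
Illustrative note on entry 15: the abstract describes computing the posterior probability that a variant is real rather than a sequencing or PCR error. The sketch below is a deliberately simplified two-hypothesis toy model and is not SiNPle's actual formula, which also uses base qualities, strandedness and a prior distribution over variant frequencies. The function name and all default parameter values are assumptions chosen for illustration.

```python
from math import exp, log


def posterior_real_variant(alt_reads, depth, error_rate=0.001,
                           variant_freq=0.01, prior=0.001):
    """Posterior probability that a site carries a real variant (toy model).

    Two hypotheses are compared with binomial likelihoods:
      H_error:   alternate reads arise at the baseline error rate
      H_variant: alternate reads arise at a fixed true variant frequency
    The binomial coefficient is identical under both hypotheses and cancels,
    so only the log-likelihood ratio is needed. Working in log space keeps
    the computation stable at very high read depths.
    """
    ref_reads = depth - alt_reads
    log_lr = (alt_reads * log(variant_freq / error_rate)
              + ref_reads * log((1 - variant_freq) / (1 - error_rate)))
    log_odds = log(prior / (1 - prior)) + log_lr
    if log_odds < -700:  # exp() would overflow; posterior is effectively 0
        return 0.0
    return 1.0 / (1.0 + exp(-log_odds))


# 1 alternate read in 1000 is consistent with noise; 20 in 1000 is not
print(posterior_real_variant(1, 1000) < 0.01)   # True
print(posterior_real_variant(20, 1000) > 0.99)  # True
```

Because the result depends only on per-site counts, a closed-form expression of this kind can be evaluated quickly even at very high coverage, which is the property the abstract emphasises.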
16.
Infect Genet Evol ; 74: 103931, 2019 10.
Article in English | MEDLINE | ID: mdl-31238112

ABSTRACT

Epizootic hemorrhagic disease virus (EHDV) is a Culicoides-transmitted orbivirus that infects domestic and wild ruminants in many parts of the world. Of the eight proposed serotypes, only EHDV-1, 2 and 6 have been reported to be present in the Americas. Following the identification of a virulent EHDV-6 reassortant virus in the USA in 2007 (EHDV-6 Indiana), with outer coat protein segments derived from an Australian strain of EHDV and all remaining segments derived from a locally circulating EHDV-2 strain, questions have remained about the origin of the Australian parent strain and how it may have arrived in the USA. When EHDV-6 was identified in asymptomatic cattle imported into the Caribbean island of Trinidad in 2013, full genome sequencing was carried out to further characterise the virus. The EHDV-6 Trinidad was a reassortant virus, with 8 of its 10 segments being derived from the same exotic Australian EHDV-6 strain as the VP2 and VP5 present in the EHDV-6 Indiana strain from the USA. Analyses of the two remaining segments revealed that segment 8 showed the highest nucleotide identity (90.4%) with a USA New Jersey strain of EHDV-1, whereas segment 4 had the highest nucleotide identity (96.5%) with an Australian EHDV-2 strain. This data strongly suggests that the Trinidad EHDV-6 has an Australian origin, receiving its segment 4 from a reassortment event with an EHDV-2 also from Australia. This reassortant virus likely came to the Americas, where it received its segment 8 from a locally-circulating (as yet unknown) EHDV strain. This virus then may have gained entry into the USA, where it further reassorted with a known locally-circulating EHDV-2, the resulting strain being EHDV-6 Indiana. This study therefore identifies, for the first time, the likely minor parent virus of the EHDV-6 currently circulating in the USA.


Subject(s)
Cattle Diseases/virology , Hemorrhagic Disease Virus, Epizootic/classification , Reoviridae Infections/veterinary , Whole Genome Sequencing/methods , Animals , Australia , Cattle , Genome, Viral , Hemorrhagic Disease Virus, Epizootic/genetics , Hemorrhagic Disease Virus, Epizootic/isolation & purification , Phylogeny , Reassortant Viruses/classification , Reassortant Viruses/genetics , Reassortant Viruses/isolation & purification , Trinidad and Tobago , United States
17.
BMC Bioinformatics ; 18(1): 435, 2017 Oct 02.
Article in English | MEDLINE | ID: mdl-28969593

ABSTRACT

BACKGROUND: There are a large number of biological databases publicly available for scientists on the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. RESULTS: We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and has additional features such as ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . CONCLUSIONS: We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
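The two steps described in this abstract, converting a tabular database to RDF and then exploring it through facets, can be illustrated in a few lines. This is not BioCarian's code: it is a minimal sketch that represents RDF statements as plain (subject, predicate, object) tuples instead of using an RDF library or a SPARQL endpoint. The namespace http://example.org/, the function names, and the sample rows are all hypothetical.

```python
from collections import Counter

BASE = "http://example.org/"  # hypothetical namespace, not from the paper


def rows_to_triples(rows, key):
    """Turn table rows (dicts) into (subject, predicate, object) triples.

    Each row becomes one subject identified by its `key` column; every other
    column becomes a predicate whose object is the cell value.
    """
    triples = []
    for row in rows:
        subject = f"{BASE}record/{row[key]}"
        for column, value in row.items():
            if column == key:
                continue
            triples.append((subject, f"{BASE}prop/{column}", str(value)))
    return triples


def facet_counts(triples, column):
    """Count how often each value occurs for one column (a simple facet)."""
    predicate = f"{BASE}prop/{column}"
    return Counter(obj for _, pred, obj in triples if pred == predicate)


# Made-up sample table, for illustration only.
rows = [
    {"id": "r1", "source": "A", "category": "x"},
    {"id": "r2", "source": "A", "category": "y"},
    {"id": "r3", "source": "B", "category": "x"},
]
triples = rows_to_triples(rows, key="id")
category_facet = facet_counts(triples, "category")
```

Ranking the counted values, for example with `category_facet.most_common()`, corresponds loosely to the facet-value ranking the abstract mentions, although the criteria BioCarian actually uses are not given here.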


Subject(s)
Databases, Factual , Search Engine , Internet , Software
18.
BMC Bioinformatics ; 18(Suppl 3): 71, 2017 Mar 14.
Article in English | MEDLINE | ID: mdl-28361674

ABSTRACT

BACKGROUND: The study of virus integrations in the human genome is important since virus integrations were shown to be associated with diseases. In the literature, few methods have been proposed that predict virus integrations using next generation sequencing datasets. Although they work, they are slow and are not very sensitive. RESULTS AND DISCUSSION: This paper introduces a new method, BatVI, to predict viral integrations. Our method uses a fast screening method to filter out chimeric reads containing possible viral integrations. Next, sensitive alignments of these candidate chimeric reads are called by BLAST. Chimeric reads that are co-localized in the human genome are clustered. Finally, by assembling the chimeric reads in each cluster, high-confidence virus integration sites are extracted. CONCLUSION: We compared the performance of BatVI with existing methods VirusFinder and VirusSeq using both simulated and real-life datasets of liver cancer patients. BatVI ran an order of magnitude faster and was able to predict almost twice the number of true positives compared to other methods while maintaining a false positive rate less than 1%. For the liver cancer datasets, BatVI uncovered novel integrations in two important genes, TERT and MLL4, which were missed by previous studies. Through gene expression data, we verified the correctness of these additional integrations. BatVI can be downloaded from http://biogpu.ddns.comp.nus.edu.sg/~ksung/batvi/index.html .
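The clustering step mentioned above, grouping chimeric reads that are co-localized in the human genome, can be sketched as a single pass over sorted positions. This is not BatVI's implementation: the function name, the (chromosome, position) input format, and the 500 bp `max_gap` threshold are assumptions chosen for illustration, and the screening, BLAST alignment and assembly steps are not shown.

```python
from collections import defaultdict


def cluster_breakpoints(hits, max_gap=500):
    """Group (chromosome, position) hits into clusters of nearby positions.

    Two consecutive positions on the same chromosome join the same cluster
    when they are at most `max_gap` bases apart (hypothetical threshold).
    Returns a list of (chromosome, start, end, read_count) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in hits:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                prev = pos  # extend the current cluster
                count += 1
            else:
                clusters.append((chrom, start, prev, count))
                start = prev = pos  # open a new cluster
                count = 1
        clusters.append((chrom, start, prev, count))
    return clusters


# Made-up coordinates, for illustration only.
hits = [("chr1", 1000), ("chr1", 1200), ("chr1", 5000), ("chr2", 300)]
clusters = cluster_breakpoints(hits, max_gap=500)
```

Clusters supported by more reads would be the natural candidates to pass on to the assembly step the abstract describes.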


Subject(s)
Genome, Human , Host-Pathogen Interactions/genetics , Virus Integration , Algorithms , Cluster Analysis , DNA, Viral/genetics , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , High-Throughput Nucleotide Sequencing , Histone-Lysine N-Methyltransferase , Humans , Liver Neoplasms/diagnosis , Liver Neoplasms/virology , Models, Theoretical , Sequence Analysis, DNA , Software , Telomerase/genetics , Telomerase/metabolism
19.
Nucleic Acids Res ; 43(16): e107, 2015 Sep 18.
Article in English | MEDLINE | ID: mdl-26170239

ABSTRACT

Structural variations (SVs) play a crucial role in genetic diversity. However, the alignments of reads near/across SVs are made inaccurate by the presence of polymorphisms. BatAlign is an algorithm that integrates two strategies called 'Reverse-Alignment' and 'Deep-Scan' to improve the accuracy of read-alignment. In our experiments, BatAlign was able to obtain the highest F-measures in read-alignments on mismatch-aberrant, indel-aberrant, concordantly/discordantly paired and SV-spanning data sets. On real data, the alignments of BatAlign were able to recover 4.3% more PCR-validated SVs with 73.3% fewer calls. These results suggest that BatAlign is effective in detecting SVs and other polymorphic variants accurately using high-throughput data. BatAlign is publicly available at https://goo.gl/a6phxB.


Subject(s)
Algorithms , Genomic Structural Variation , High-Throughput Nucleotide Sequencing , Sequence Alignment/methods , Base Pair Mismatch , Genome , INDEL Mutation
20.
PLoS One ; 9(9): e106575, 2014.
Article in English | MEDLINE | ID: mdl-25188507

ABSTRACT

ALK is an established causative oncogenic driver in neuroblastoma, and is likely to emerge as a routine biomarker in neuroblastoma diagnostics. At present, the optimal strategy for clinical diagnostic evaluation of ALK protein, genomic and hotspot mutation status is not well-studied. We evaluated ALK immunohistochemical (IHC) protein expression using three different antibodies (ALK1, 5A4 and D5F3 clones), ALK genomic status using single-color chromogenic in situ hybridization (CISH), and ALK hotspot mutation status using conventional Sanger sequencing and a next-generation sequencing platform (Ion Torrent Personal Genome Machine (IT-PGM)), in archival formalin-fixed, paraffin-embedded neuroblastoma samples. We found a significant difference in IHC results using the three different antibodies, with the highest percentage of positive cases seen on D5F3 immunohistochemistry. Correlation with ALK genomic and hotspot mutational status revealed that the majority of D5F3 ALK-positive cases did not possess either ALK genomic amplification or hotspot mutations. Comparison of sequencing platforms showed a perfect correlation between conventional Sanger and IT-PGM sequencing. Our findings suggest that D5F3 immunohistochemistry, single-color CISH and IT-PGM sequencing are suitable assays for evaluation of ALK status in future neuroblastoma clinical trials.


Subject(s)
Neuroblastoma/genetics , Neuroblastoma/metabolism , Receptor Protein-Tyrosine Kinases/metabolism , Anaplastic Lymphoma Kinase , Animals , Child , Child, Preschool , Female , Humans , Immunohistochemistry , In Situ Hybridization , Infant , Infant, Newborn , Mice , Mice, Inbred BALB C , Mutation , Receptor Protein-Tyrosine Kinases/genetics